be able to maintain a large data base on line in a dedicated computer.

We also have developed preliminary software to register the images so
that rapid judgements can be made about the number of spots that have
equivalent mobilities on any two images. Software is also available to
use data from many such binary comparisons to perform cluster analysis
to determine hierarchical relationships among the various viral strains.
Considerable progress has been made in most areas of the project.

The next step is to integrate the various pieces into a unified system.
This  we  hope  to  accomplish  in  the  forthcoming  year. 

Not all problems have yet been overcome. Improvements and modifications
are needed in almost all important software areas. Nevertheless
we feel that we are far enough along with the development work that
we can begin the production phase of the contract. Although considerable
software development is still needed, our most serious problems
at present relate to hardware limitations. The processing time required
by the segmentation program is far too extensive to allow us
to follow an efficient production schedule. In addition, the VAX computer
which will support the data base is a shared system. As the data
base grows we will face increasing difficulties in obtaining enough
memory to maintain it on line. In the immediate time period we are
suffering significant delays due to competition for system resources
with instructional users. This problem has become increasingly severe
due to changes in the University's philosophy on computing.

Our major recommendation is that we attempt to perform production
work in the coming year and thereby begin establishing an integrated
software system. In order to accomplish this we believe it is essential
to upgrade the processor on the minicomputer that is used for the
segmentation work and to begin examining alternatives to use of the
campus VAX system for storing the data base. We also recommend that
some input regarding useful features be obtained from test users
during this second year. In this regard installation of a suitable
graphics terminal at the CDC laboratory which is preparing the
fingerprints should be considered.




FOREWORD


Citations  of  commercial  organizations  and  trade  names  in  this  report  do  not 
constitute  an  official  Department  of  the  Army  endorsement  or  approval  of  the 
products  or  services  of  these  organizations. 


TABLE OF CONTENTS

I.  Problem  Statement 

II.  Background 

III.  General  Approach 

IV.  Results  and  Discussion 
A.  Segmentation 

B. Image Rectification

C. Marker Oligonucleotides

D.  Data  Base 

E.  Data  Analysis 

V.  Conclusions  and  Recommendations 


PROBLEM  STATEMENT: 


The assigned task is to develop an automated system for analyzing newly
determined dengue virus fingerprints in the context of an on line library of
previously existing fingerprints. It is intended that such an analysis will allow
a rapid identification of previously examined isolates that most closely
resemble any new isolate. It is also hoped that correlations of various
features of the fingerprints can be made with epidemiologically significant
properties of the various isolates.

BACKGROUND: 

Dengue viruses, the etiologic agents of "classical" dengue fever and dengue
hemorrhagic fever/dengue shock syndrome (DHF/DSS), are distributed over tropical
and subtropical regions of all five continents (1). In urban settings the
viruses are transmitted to man principally by Aedes aegypti mosquitoes, and the
resourcefulness of this vector in adapting to tropical urban environments is
believed to be a major factor contributing to a dramatic increase during recent
years in DHF/DSS (2).

Half of the world's population resides in areas where dengue virus is
endemic, and 350 million people live in areas where there have been DHF/DSS epidemics
during the last decade. In Thailand in 1977, DHF/DSS was the second leading
cause for hospitalization of children and the leading cause of death due to
communicable diseases at any age. Dengue viruses play a prominent role in
pediatric morbidity and mortality in several major tropical population areas of
the world (2).

Although the statistical data accumulated during the past two decades
suggest that dengue viruses are an increasing public health problem, many basic
questions concerning their epidemiology and pathogenesis remain unanswered. Why
do certain populations contract severe forms of dengue disease and others only
"classical" dengue fever? The "second-infection" hypothesis for DHF/DSS (i.e.,
sequential infections with heterologous dengue serotypes), initially proposed by
Halstead et al. (3), is supported by a variety of epidemiological studies (cf.
Halstead for a review). In vivo studies probing the mechanisms involved in
this hypothesis have given rise to the immune enhancement theory (4).
Alternative explanations for the DHF/DSS syndrome have been put forward by
Hammon (5) and Rosen (6). Essentially, these workers argue for virus-specific
virulence factors as the source of different responses by different populations
to dengue virus epidemics (i.e., "classical" vs. DHF/DSS disease). However, few
data describing genetic markers which differentiate viruses within a given
serotype are available, and the choice or reconciliation between these two
alternatives must await such data.

Although serological methods, haemagglutination inhibition (HI), complement
fixation (CF) and plaque reduction neutralization tests (PRNT), subdivide dengue
viruses into four serotypes (7), they are not sufficiently sensitive to
distinguish among members of a given serotype. With a method of sufficient
sensitivity, it might be possible to differentiate a virus isolate which has the
potential for causing DHF/DSS from one which does not have this capability.

The method of oligonucleotide fingerprint mapping yields distinct patterns
for different members of a given virus family (8-12). The oligonucleotide maps
from strains of La Crosse virus (LAC) with no detectable difference in
serological characteristics range from (a) identity to (b) minor but clearly
distinguishable differences to (c) fingerprints with no apparent relationship to
one another (10). Thus oligonucleotide fingerprints are a very sensitive method
for measuring relationships between viruses which are indistinguishable by other
criteria. For example, the method has been used to develop a dendrogram of the
genetic relationships between various isolates of La Crosse virus from the
western U.S. (11).

These studies suggest that the method has sufficient sensitivity to
quantitatively measure small differences between dengue virus isolates from around
the world. An initial study of four prototype strains of dengue virus,
representing each of the serotypes, showed that less than 10% of their unique
oligonucleotides were identical (13). These same investigators then proceeded to
show that oligonucleotide maps from dengue viruses within a given serological
type (DEN1) showed varying degrees of relatedness as a function of how close in
time or geographic location they had been isolated.

Isolates from within a given geographical area (Africa, the Caribbean and
Pacific/S.E. Asia) demonstrated 45% to 100% homology in their unique
oligonucleotides (i.e., > 10 bases long) while only 20% to 30% of the unique
oligonucleotides were identical in viruses from different geographical areas.
Comparison of the patterns from a DEN1 virus, first detected in the Caribbean in
1977, with DEN1 isolates from other regions of the world demonstrated the
highest homology (50% to 60%) between the Caribbean isolate and isolates from
Sri Lanka and Nigeria, suggesting that the new Caribbean virus came originally
from one or the other of these areas (13). More recently a detailed study of
Dengue 2 strains (14) showed that geographically isolated and epidemiologically
unrelated viruses had very distinct fingerprints whereas strains that were
related shared 80-95% of their large oligonucleotides.

The work on DEN1 (13), DEN2 (14) and LAC viruses (10) has established that
the method can distinguish arbovirus isolates from different geographic areas.
It is therefore of great value in epidemiological studies on dengue viruses.
However, actual sequencing of the individual oligonucleotides on a particular
RNA fingerprint is tedious and time consuming. The usual approach has thus been
for individual investigators to visually compare the spatial location of spots
between various fingerprints. This becomes increasingly time consuming as the
number of binary comparisons increases and is subject to inevitable variations
caused by different interpretations made by individual scientists. The entire
procedure would be greatly facilitated by the availability of a computer-based
hardware and software system which could rapidly analyze the data in an accurate,
uniform manner. In addition, this system could bring all the fingerprint data
together into a single site that would be accessible to all interested
investigators in the field. Development of such a computer-based system is the major
goal of the work being conducted for this contract.

COLLABORATIVE ASPECTS OF THE WORK:

The  work  is  being  performed  by  two  groups  on  separate  contracts.  A  group 
under  the  direction  of  Dr.  Dennis  Trent  at  the  Centers  for  Disease  Control 
Laboratories  in  Fort  Collins,  Colorado  is  responsible  for  preparation  of  viral 
RNA  and  the  fingerprints.  The  University  of  Houston  group  is  responsible  for 
preparation  of  the  software  and  maintenance  of  the  data  base  once  it  is 
established.  This  latter  work  is  described  in  this  report. 




GENERAL  APPROACH: 

The intent is to reduce each fingerprint to a set of feature files that can
be readily stored in an on line data base. These feature files will contain
sufficient information to allow in most instances the comparison of any new
isolate of dengue fever virus with previously examined isolates without resorting
to reexamination of the fingerprint of the earlier isolate. The information
contained in these feature files should allow construction of a display for any
known isolate that bears close resemblance to the original fingerprint. These
feature files must be sufficiently simple that they do not require prohibitive
amounts of storage. Once this data base is established it is necessary to
develop software to compare individual fingerprints to one another and to
analyze trends in the data base as a whole.

Our approach is to use modern image processing technology to establish the
requisite data base. The first step is to digitize the original x-ray films
produced by the fingerprinting procedure. This is currently done by scanning
the image with a vidicon camera equipped with either a Canon TV-16 25mm lens or
a Canon TV-16 13mm wide angle lens. The camera is interfaced to a Spatial Data
EyeCom III image analysis system which gives a video output to 16 bits with 256
gray levels ranging from 0 (white) to 255 (black). The resulting image consists
of a 640 x 480 pixel array stored on a DEC PDP 11/23 computer.

The digitized image is subsequently segmented to separate the oligonucleotide
spots from the background. The current system for doing this examines
the numerical second derivative of the image in order to find core spots which
are subsequently propagated (15). The segmented image is the key to obtaining a
manageable data base and is thus the crucial software development step.
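
The core-finding and propagation steps can be illustrated with a short sketch. It is not the algorithm of reference 15: it assumes a simple 4-neighbour Laplacian as the numerical second derivative, and the names `laplacian`, `segment`, `core_thresh` and `background` are hypothetical. Because gray levels run from 0 (white) to 255 (black), a spot is a local maximum and its core shows a strongly negative second derivative.

```python
def laplacian(img):
    """4-neighbour discrete Laplacian; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x] +
                         img[y][x - 1] + img[y][x + 1] - 4 * img[y][x])
    return out


def segment(img, core_thresh, background):
    """Label spots: seed at core pixels, then propagate outward.

    A core pixel has Laplacian <= -core_thresh. Each core is grown into
    4-connected neighbours whose gray level exceeds `background`.
    Returns (label image, number of spots); label 0 means background.
    """
    lap = laplacian(img)
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    n_spots = 0
    for y in range(h):
        for x in range(w):
            if lap[y][x] <= -core_thresh and labels[y][x] == 0:
                n_spots += 1
                labels[y][x] = n_spots
                stack = [(y, x)]
                while stack:  # flood fill from the core pixel
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and labels[ny][nx] == 0
                                and img[ny][nx] > background):
                            labels[ny][nx] = n_spots
                            stack.append((ny, nx))
    return labels, n_spots
```

In such a scheme the two thresholds play the role of selection criteria, and the boundary and total optical density of each spot follow directly from the label image.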


From the segmented image a feature file is to be obtained which allows
rapid construction of a facsimile of the original image. At present we believe
that three features will accomplish this. These are the set of pixels that
define the boundary of the spot in the segmentation, the total optical density
found in the spot, and the location of any optical density maxima.
Subsequent intercomparison of any two images is accomplished by a registration
program which utilizes a second or third order bivariate polynomial
transformation to bring the images into alignment. This program requires the
choice of appropriate spots which are clearly equivalent to define the geometric
warping or, in the case of quite dissimilar fingerprints, the inclusion of
external markers.

The registered images can be examined visually to determine how many spots
have the same mobility and in this way a quantitative measure of the degree of
similarity between all combinations of isolates can be calculated. These
individual measures of similarity are used as input data to cluster analysis
programs which construct dendrograms that show the relationships between the
individual isolates. These same dendrograms also will provide key information
to efficient arrangement of the strains in the data base.
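
One way such a similarity measure could be computed from two registered images is sketched below. The report does not specify the formula; the sketch assumes spot centers as (x, y) pairs, a distance tolerance for "same mobility", and a Dice-style coefficient (twice the matched spots over the total spots). The names `count_matches`, `similarity` and `tol` are hypothetical.

```python
import math


def count_matches(spots_a, spots_b, tol):
    """Greedily pair each spot in A with the nearest unused spot in B
    that lies within `tol` pixels; return the number of pairs."""
    unused = list(spots_b)
    matches = 0
    for ax, ay in spots_a:
        best, best_d = None, tol
        for i, (bx, by) in enumerate(unused):
            d = math.hypot(ax - bx, ay - by)
            if d <= best_d:
                best, best_d = i, d
        if best is not None:
            unused.pop(best)  # each spot in B may be matched only once
            matches += 1
    return matches


def similarity(spots_a, spots_b, tol=2.0):
    """Dice-style similarity in [0, 1] between two registered spot lists."""
    if not spots_a and not spots_b:
        return 1.0
    m = count_matches(spots_a, spots_b, tol)
    return 2.0 * m / (len(spots_a) + len(spots_b))
```

The value 1 - similarity can then serve as the distance fed to a cluster analysis program.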

RESULTS AND DISCUSSION:

During the first year we have developed most of the components of
the system described above. It remains to solve residual problems, establish
the data base and integrate the software. In this section
we will review progress in several key areas in more detail.

A. Segmentation:

Segmentation of the images is the process by which the computer decides
which pixels are actually included in spots and which are not. A successful
segmentation algorithm is essential to automate the location of spots, to define
those pixels which should be included in the calculation of key parameters such
as total spot intensity and spot center, and to define a boundary for use in the
data base in order to allow display of a good facsimile of the original image.
The success of the project rests to a large extent on obtaining a good
segmentation and for this reason this aspect has received the most attention during
the first year.

What makes the segmentation problem difficult is that, despite
their appearance to the human eye, actual spots blend quite gradually into the
background and do not have a well defined edge. In order to circumvent this it
is usual to attempt to enhance the edge of the spot in some way. We first
attempted to do this with an algorithm using a numerical first derivative for
which we had existing source code. This approach was successful in finding many
spots but generally failed because intensity levels on the images are not
uniform. This is the result of single spots which contain either several copies of
the same oligomer or several oligomers with essentially identical composition.

After several attempts to improve this algorithm we decided that the best
strategy would be to adapt an algorithm which has been successfully used in the
analysis of protein electrophoresis gels (15). Because this algorithm was only
available in a computer language (SAIL, Stanford Artificial Intelligence Language)
that cannot be run on either the VAX or the PDP 11/23, it was necessary to
laboriously convert it to FORTRAN. This conversion was done on a VAX 11/780 as
well as the PDP 11/23 system on which the actual processing will be conducted,
and operational code currently exists for both machines. The VAX version has
been very useful in debugging but will not be used in the actual processing due
to limited access. The invested time has proven fruitful in that we now have an
essentially operational segmentation algorithm.

The most serious difficulty now confronting us with the segmentation is
computation time. In order to set up the data base we will have to process the
existing backlog of fingerprints from the earlier studies as well as the new
fingerprints from the current study. In addition, as refinements are added to
our package it will inevitably be necessary to reprocess all or some of the
images again. While we are doing this catching up we will also have to process
the many new images being produced by the CDC group, which is now in the
production phase. Our PDP 11/23 currently requires approximately four hours to fully
process a single full resolution image (a considerable improvement over the 16
hours that was required when the algorithm was first implemented). We have been
unable to find any way of further accelerating the calculations and thus have
explored other options. In particular we have purchased hardware to allow
remote communication with the machine so that the segmentation algorithm can be
run at night. We have also explored a possible upgrade of the system to the
newer PDP 11/73 configuration. Since calculational speed is the primary
limiting factor rather than input/output operations we can expect a 50-70%
improvement if we are able to make this upgrade.

Other minor difficulties also remain with the segmentation. We still have
problems at full resolution due to local fluctuations in the second derivative.
These can be alleviated by improving the quality of the histogram obtained
during segmentation and by modifying the neighborhood included in the second
derivative calculation from the current 5 x 5 to 7 x 7. The algorithm also is
occasionally missing pieces of spots or generating extra spots. In part these
problems can be resolved by more proficient use of the built-in selection
criteria. The problems that remain can likely be traced to a less than optimal
weighting of the 5 x 5 neighborhood used in calculating the second derivative.
In the long run these difficulties will be resolved. In the short run we intend
to temporarily finesse them by adding some simple editing functions to allow
minor operator directed revisions.

B. Image Rectification:

It is important to facilitate image comparison. In order to do this one
would like to eliminate the inevitable variations that occur from one run to the
next so that images could be overlapped and spots of identical mobility
identified. To effect such an inter-comparison of images a group of register points
common to both images is selected. The register points of one image are mapped
onto the plane formed by the register points of the second image. This amounts
to fixing the plane of one image and then warping the plane of the other so that
both sets of register points overlap relative to an observer orthogonal to the
fixed plane. The technique for determining the transformation equations is
known as polynomial warping and it employs a bivariate polynomial to effect the
transformation.
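
A minimal sketch of second order polynomial warping follows. It assumes the transformation is fitted to the register points by ordinary least squares, which the report does not state explicitly, and the names `terms`, `solve`, `fit_warp` and `apply_warp` are hypothetical. Each output coordinate is modelled as a0 + a1 x + a2 y + a3 x^2 + a4 xy + a5 y^2, so at least six well distributed register points are needed.

```python
def terms(x, y):
    """Monomials of a second order bivariate polynomial."""
    return [1.0, x, y, x * x, x * y, y * y]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_warp(src, dst):
    """Least-squares fit mapping register points `src` onto `dst`.
    Returns two coefficient lists, one for x' and one for y'."""
    rows = [terms(x, y) for x, y in src]
    n = len(rows[0])
    # Normal equations: (A^T A) c = A^T b, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           for i in range(n)]
    coeffs = []
    for k in (0, 1):
        atb = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(n)]
        coeffs.append(solve(ata, atb))
    return coeffs


def apply_warp(coeffs, x, y):
    """Map a point of the warped image into the plane of the fixed image."""
    t = terms(x, y)
    return tuple(sum(c * v for c, v in zip(cs, t)) for cs in coeffs)
```

A third order version would only extend `terms` with the four cubic monomials and require at least ten register points. Higher order fits are more sensitive to where the register points lie.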

We have implemented and tested this approach for both the second and third
order cases. The second order transformation can be locally quite good but is
generally not adequate to superimpose two images at high resolution. The third
order algorithm with some control point choices gave almost perfect
superpositions. A different selection of valid register points, however, can give
results that are even worse than the second order case. We do not at present know
the origin of this difficulty. In the initial phases of the production work it is
likely that we will have to use significant operator intervention with these
transformation routines and perhaps do the transformation in a series of local
regions. This will be greatly facilitated by the display capabilities we are
developing as part of the data base (section D).

C. Marker Oligonucleotides:

One of our goals was to accomplish rectification of images from rather
distantly related strains. We had hoped that a standard set of marker
oligonucleotides could be developed to accomplish this, preferably in a double label
format where the marker would only be present on one of two images and thus
would not obscure information. The CDC group has worked very hard on this
problem, and we have had frequent interactions regarding this. It is now
possible to make markers, and to predict closely where they will go. The
double labeling idea has proven to be untenable, and as a result
the marker to RNA ratio must be tightly controlled if the markers
are to be seen without obscuring real spots. This too is not possible without
overcomplicating the experimental procedures.

We nevertheless believe that we can still accomplish our goal by a prototype
pattern approach. As the data accumulate, various obvious groups become
apparent. Once a significant number of any type are encountered one isolate can
be selected as a prototype pattern. A subset of a standard marker set (i.e., we
will exclude any marker that will be obscuring in that particular strain) will
be included with a second batch of the RNA and a new fingerprint produced. This
will allow matching with other prototype patterns. Patterns within the group
will be easily matched to the prototype without markers due to the very high
similarity. The presence of markers on the prototype patterns will also allow a
more uniform definition of the spots to be considered during any individual
binary comparison. It would also be very useful if the CDC group were able to
supply sequence data for some or all of the spots on the prototype pattern.
Indeed the availability of complete data of this sort would alleviate much if
not all of the need for markers.

D.  Data  Base: 

Recently we have begun construction of the data base itself. The immediate
concern is of course storage space. How can one maintain a large number of 640
x 480 pixel images on line without overwhelming the storage capabilities of even
a large minicomputer such as the VAX 11/780? The answer is of course that you
can't. Instead our goal is to generate a high quality facsimile of each
fingerprint. By plotting the boundary pixels of each spot as determined during
the segmentation we are able to produce a good likeness of the original image.
The quality of this image can be further enhanced by shading the interior of
each spot according to its calculated average optical density.

At first sight it would appear that even the storage of a set of
coordinates for each boundary pixel would soon become a problem. However if one
knows the location of an initial point on the boundary the next point can be
found without resort to knowledge of its coordinates by simply specifying which
of seven possible directions it is relative to the previous point. This works
because by definition the boundary pixels immediately adjoin one another and so
each one must always be one pixel away from its predecessor in some direction.
This approach will require preprocessing of the coordinate data to define the
direction. The net effect is that a small amount of calculational time will be
exchanged for much more efficient storage of the data in the data base. This
efficiency of storage will be further augmented by storing two such directions
in each computer word. The net effect is an additional 70% reduction in the
required storage space. We now estimate that 20-25 facsimile images will
require one megabyte of storage. On a dedicated VAX system with a 460 megabyte
hard disk almost 9000 images could be handled. In our shared VAX environment
however storage will quickly become a problem, and we may only be able to
maintain a prototype data base on line unless this problem is resolved.
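
The direction coding described above can be sketched as follows. This is an illustration under stated assumptions, not the storage format actually implemented: it uses the conventional eight absolute neighbor directions rather than seven directions relative to the previous step, packs two 4-bit codes into one small integer as a stand-in for a computer word, and pads an odd code with 0xF. The names `DIRS`, `encode`, `pack`, `unpack` and `decode` are hypothetical.

```python
# (dx, dy) for direction codes 0..7, starting east; y increases downward.
DIRS = [(1, 0), (1, -1), (0, -1), (-1, -1),
        (-1, 0), (-1, 1), (0, 1), (1, 1)]
PAD = 0xF  # filler code used when the number of directions is odd


def encode(boundary):
    """Reduce an ordered list of adjoining boundary pixels to a start
    point plus one direction code per step."""
    codes = [DIRS.index((x1 - x0, y1 - y0))
             for (x0, y0), (x1, y1) in zip(boundary, boundary[1:])]
    return boundary[0], codes


def pack(codes):
    """Store two direction codes in each word (4 bits apiece)."""
    padded = codes + [PAD] * (len(codes) % 2)
    return [(padded[i] << 4) | padded[i + 1]
            for i in range(0, len(padded), 2)]


def unpack(words):
    """Recover the direction codes, dropping any padding."""
    codes = []
    for w in words:
        for c in (w >> 4, w & 0xF):
            if c != PAD:
                codes.append(c)
    return codes


def decode(start, codes):
    """Rebuild the boundary pixel coordinates for display."""
    pts = [start]
    for c in codes:
        dx, dy = DIRS[c]
        x, y = pts[-1]
        pts.append((x + dx, y + dy))
    return pts
```

Only the start point is stored as full coordinates, so the saving grows with the length of each boundary.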

We have succeeded in displaying such facsimile images, and we have
implemented the storage efficiencies described above. In addition we have developed
prototypes for several useful display functions. For example windowing on our
Lundy T-5684 graphics terminal will allow simultaneous display of six 1/4
resolution images and eventually pairs of images as compared by the transformation
algorithm. Indeed we expect to be able to move one image relative to another
following the transformation to allow the local refinement in registration that
may be needed to facilitate decision making.

With the availability of the facsimile images we have also begun designing
the retrieval portion of the data base. The important consideration here is
efficiency in the recovery of any desired data. Our intention is to begin with
a hierarchical system as it is likely in the present application that data from
sets of closely related strains will frequently be simultaneously retrieved. We
also intend to begin exploring the extent to which graphics capabilities can be
provided to a remote user of the data base through a personal computer. A DEC
Rainbow 100+ is available to us, and thus will be used for this purpose. If
this proves feasible we might at a later time attempt to extend support to other
models.

E. Data Analysis:

It will be essential to prepare a variety of application programs to
interact with the data base. We have an existing 16S rRNA oligonucleotide
catalog data base (16) which can be used to maintain any oligonucleotide sequence
data the CDC group generates. More importantly this existing package contains
several programs that can be readily transferred to the VAX for use in analysis
of the image data. Especially pertinent here is an average linkage clustering
program that can be used to produce dendrograms and an associated program which
will highlight lines on a dendrogram to readily display those which carry a
particular attribute such as the presence of a particular spot or a common region
of origin.
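
For readers unfamiliar with average linkage clustering, a minimal sketch is given here. It is not the existing program referred to above. It assumes a symmetric matrix of pairwise distances between isolates (for instance 1 minus a similarity measure), and the names `cluster_distance` and `average_linkage` are hypothetical.

```python
from itertools import combinations


def cluster_distance(a, b, dist):
    """Average of all pairwise distances between members of a and b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def average_linkage(dist):
    """Agglomerative clustering with average linkage.

    `dist` is a symmetric matrix (list of lists) of distances between
    isolates. Returns the merges in order as (cluster_a, cluster_b,
    height), where clusters are tuples of isolate indices. The merge
    heights are the branch points of the dendrogram.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda p: cluster_distance(p[0], p[1], dist))
        height = cluster_distance(a, b, dist)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, height))
    return merges
```

This naive form recomputes every cluster distance at each step, which is adequate for a few hundred isolates but would need a distance-update scheme for a much larger data base.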

CONCLUSIONS  AND  RECOMMENDATIONS: 

After one year very significant progress has been made. Essentially all
the components of the required software system are now available. In several
places such as registration of images the approaches are still a bit crude. In
other areas such as segmentation what remains to be done is largely refinement
though considerable effort will be required to accomplish the needed
improvements. The not inconsiderable remaining task is to integrate these pieces
together. In order to accomplish this we believe the best approach is to enter
the fray and see which problems really are the most central. We have thus
agreed with the CDC group that they should discontinue most of their
developmental efforts and begin full scale production of fingerprints.

Independent of our remaining developmental work we see several problems on
the horizon. Most pressing is the need to speed up the segmentation procedure.
This can be accomplished very effectively by upgrading the present PDP 11/23
system to the new PDP 11/73 configuration which will give almost a four-fold
improvement in processing power at a very reasonable cost.

Further out on the horizon we see increasing difficulty in obtaining adequate
access to the campus VAX 11/780. Although we have found a very efficient method
for storing facsimile fingerprints it is inevitable that we will exceed the
relatively meager allotment of on line storage that the University will provide
free. In the long run we will have to either pay for the space on a monthly
basis or purchase an additional disk drive for the system. Even more
threatening is the matter of processor time during class periods. Due to recent
internal decisions to establish a "computer intensive environment" on this
campus the educational usage of the VAX systems has gone way up. We anticipate
that in order to effectively support the significant increase in demand on
system resources brought on by the production phase of the project we will
require access to a small but dedicated VAX system by the middle of the second
or early third year.


With the onset of the production phase of the work we would strongly
recommend that a stronger tie be established between those who wish to use the data
base and those that are developing it. Our group lacks real expertise with the
epidemiological aspects of the problem and would profit greatly by input from
test users of the system regarding additional useful features and improvements
in ease of use. In addition such an interface would greatly facilitate
development of documentation. We feel that this can be effectively achieved by
equipping the CDC group with an appropriate graphics device for accessing the
image data base.